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Abstract. We present an improved finite-size scaling method for reliably extracting 
the critical temperature Tbkt of a Berezinskii-Kosterlitz-Thouless (BKT) transition. 
Using the known Webcr-Minhagen multiplicative logarithmic correction to the spin 
stiffness ps at Tbkt and the Kosterlitz-Nelson relation between the transition 
temperature and the stiffness, Ps{Tbkt) = 2Tbkt/7'', wc define a size dependent 
transition temperature TBKT(ii,-^2) based on a pair of system sizes Li,L2, e.g., 
L2 ~ 2Li. We use Monte Carlo data for the standard two-dimensional classical XY 
model to demonstrate that this quantity is well behaved, rapidly convergent, and can be 
reliably extrapolated to the thermodynamic limit, Li, L2 ^ 00. Using GPU (graphical 
processing unit) computing, we obtain high-precision data for L up to 512 and extract 
a transition temperature Tbkt = 0.89274(1), where the statistical error ±1 in the last 
digit is about 6 times smaller than that of the best previous estimate. 



PACS numbers: 64.60.De, 64.60.Bd, 64.60.fd 
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1. Introduction 



The Berezinskii-Kosterlitz-Thouless (BKT) transition [H Ej [3] is very well understood 
in terms of its physical mechanism of vortex-antivortex unbinding. The field-theoretical 
formulation of this two-dimensional (2D) problem of an U(l) symmetric order parameter 
gives a rigorous quantitative characterization of the transition into the critical ( "quasi- 
ordered") state obtaining below Tkbt- Despite this detailed understanding, analyzing 
numerical data (normally from Monte Carlo simulations) for the BKT transition on finite 
lattices still is considered a challenge [H 13 O [71 |9l [TOl [11] , because of the presence of the 
Weber-Minhagen (WM) logarithmic finite-size corrections [12] to Tbkt and higher-order 
corrections to it [TT] . 

We here propose an improved procedure to to extract the BKT transition 
temperature in the thermodynamic limit using a finite-size definition of Tbkt(-Z^i, -^^2) 
based on the spin stiffness (helicity modulus) ps for a pair of system sizes. This 
generalizes the "curve crossing" method often used when analyzing dimensionless 
quantities at conventional phase transitions [13], and also at the BKT transition [9| [TT]. 
but is even more powerful because we have combined it with the Nelson-Kosterlitz (NK) 
criterion [Tl] for the discontinuity of the stiffness in the thermodynamic limit, 

^^(-Tbkt) = • (1) 

TT 

Imposing the constraint also for finite size, by adjusting the unknown constant appearing 
in the WM logarithmic correction, reduces the statistical fluctuations very significantly 
and leads to a well-behaved finite-size extrapolation. 



1.1. Logarithmic corrections 

WM derived the following logarithmic finite-size correction to the spin stiffness exactly 
at the transition temperature [T2] : 

P,(Tbkt, L) = Ps{Tbkt, 00) 1^1 + 2in(L) + c ) ' 

where L is the linear system size and C an unknown constant which depends on the 
microscopic details of the system under study. We illustrate the slow convergence 
in Fig. [1] by plotting raw Monte Carlo results for ps for the classical 2D XY model 
(we will describe the calculations below in Sec. [5]) for different system sizes versus the 
temperature. 

Previous approaches to use the WM correction in finite-size extrapolations of Tbkt 
have typically attempted to find the best value of C to fit a series of finite-size data [7] , 
or by dividing out the factor containing the logarithm, with C chosen such that curves 
graphed versus the temperature for different system size cross each other within as 
narrow a range of T as possible (with the crossing points for large lattices approaching 
the BKT temperature) [9]. With the log-correction divided out, curves for different 
system sizes graphed versus T can also be scaled to collapse onto each other remarkably 
well by using the known exponential divergence of the correlation length [13] . 
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Figure 1. Monte Carlo results for the spin stiffness of the 2D classical XY model 
for several lattice sizes of the form L = 2". A discontinuity develops at Tbkt when 
L oo, at a point satisfying the NK relation, Eq. ([T]), indicated here by the line 
Ps = 2T/tt. The vertical line is the actual transition temperature Tbkt ~ 0.89274 (as 
determined in this paper) of the model. Thus, the stiffness discontinuity for L ^ oo 
is located at the intersection of the two lines. 



We here take a different route to finite-size extrapolations of Tbkt, by letting also C 
be size-dependent. For a pair of system sizes Li, L2, we define C{Li, L2) and consider the 
stiffness of the two sizes at arbitrary temperature T with the multiplicatice correction 
in Eq. ([2]) divided out; 



determine C(Li, L2) and define a size-dependent transition temperature Tbkt(-Z^i, L2) by 
demanding that the normalized stiffness p*[Tbkt(-^^i, L2), Ln] for both n = 1 and n = 2 
satisfies the NK relation ([T]). Thus, when plotted in the plane {ps,T), the quantities 
for n = 1 and n = 2 will coincide at a point — in practice a point where the two curves 
cross each other — and this point is also exactly crossed by the NK line ps = 2T/tx. We 
illustrated the method with actual Monte Carlo data in Fig. [2j The size-dependence 
of the constant C, chosen to enforce the NK relation, changes the nature of the size 
corrections to the transition temperature. In Sec. [3] we will analyze the drift of the 
crossing point exemplified in Fig. |2] as a function of the system size with Li = L and 
L2 = L/2 and show that the size dependent temperature Tbkt{L, L/2) converges very 
rapidly, almost linearly, with increasing L. It can be extrapolated to the thermodynamic 
limit using a low-order polynomial. We will also investigate the size dependence of the 
constant C{L,L/2) in the logarithmic correction. 




Ps{T,oo) and we can uniquely 



n = 1,2. 



(3) 
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Figure 2. Illustration of the fitting procedure based on two system sizes; here with 
Li = 16 and L2 = 32. The constant C = C{Li,L2) in the logarithmic finite-size 
correction to the stiffness in Eq. ([3]) has ben chosen such that the two p* curves [where 
the log-correction in Eq. © has been divided out] cross each other exactly at the 
temperature satisfying the NK criterion, Eq. ([1]). In this case C(16, 32) = 1.271. 



1.2. Stiffness renormalization 

An interesting complication for finite-lattice calculations of ps was noted some time ago 
by Prokof'ev and Svistunov [15]: For a system on a torus (i.e., with periodic boundary 
conditions in both directions of the 2D square lattice), the stiffness measured in the 
standard way in simulations [in the case of the classical XY model using Eq. (I7j) in 
Sec. [2] does not give ps exactly. It is affected by a normalization factor depending on 
the aspect ratio R = L^/Ly of an x Ly lattice. This is because the derivation of (JTj) 
based on imposing a twist (see, e.g., Ref. [13]) assumes that there is no net flux field 
threading the torus, while in fact such "field quanta" are thermally excited in the the 
torus at any finite temperature, and they renormalize the stiffness in two dimensions 
(but there is no such effect in three dimensions). In the limit — )■ 00, Ly — j- 00, the 
stiffness measured according to (JTj) in the x and y direction is related to the stiffness ps 
appearing in the BKT action and in Eqs. (JT]) and ([2]) according to; 

Px = fx{R)ps, Py = fy{R)ps, (4) 

where fx 7^ fy unless R = 1 and fx I, fy ^ when R 00. 

Fortunately, the renormalization factors fx, fy due to the thermally excited flux 
quanta can be easily computed numerically (and in a special case analytically in terms 
of Ramanujan's 0-function [7]); a list for selected R is given in Ref. [7]. Here we will 
use the aspect ratio R = 1, for which fx = fy = 0.999825 [15]. As previously noted in 
Ref. [7] , Monte Carlo calculations of Tbkt have in the past typically not reached the level 
of precision where this factor would play any role (for R = 1, which is normally used) , but 
in high-precision calculations the renormalization should be included in order to avoid 
a systematical error. Our calculations here are at the level where the renormalization 
must be taken into account, as was also done in a recent large-scale study [TO] . 
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2. Monte Carlo calculations 



We use standard Monte Carlo methods implemented using GPU computing to calculate 
the stiffness (helicity modulus) for the standard classical XY model (two-dimensional 
vectors of size S = 1) with energy 

H = -Y,s,-s, = -Y, cos(e, - e,), (5) 

where the expression in terms of the angles Gj is more convenient in practice. We here 
first discuss the definition of the helicity modulus and then discuss some details of the 
Monte Carlo algorithms and their GPU implementation. 



2.1. The helicity modulus 

The helicity modulus is defined according to 
1 



pa 



(6) 

</>=0 



where is the free energy in the presence of a twist field (or, equivalently, a twisted 
boundary condition) in the lattice direction a {a = x,y). The Monte Carlo estimator 
for this quantity is given by 

Pa = j.({Ha)-^{ll)), (7) 



L2 r 

where Ha is the energy including only the a-directed links (nearest-neighbor site pair) 
in (|5]) and la is the "current" in the a direction, given by 

4 = -5];sin(e,-0,). (8) 

A pedagogical derivation of these expressions can be found in Ref. [13]. Note again 
that the stiffness constant appearing in the BKT action is related to the helicity moduli 
Pxi Py (where we use = Ly so that = Py) by the renormalization factors in (jl]). 



2.2. GPU computing 

Here we summarize our Monte Carlo simulations on the GPU, which we have 
implemented using the NVIDIA CUDA framework. We refer interested readers to 
available literature for an introduction to the details of the GPU hardware and the 
programming models |16] . 

We use parallel Metropolis single-spin flips as well as over-relaxation moves [TTf ITS] . 
In addition, to improve the dynamics, and for convenience when computing stiffness 
values for a range of temperatures close to the transition, we run several temperatures 
and apply parallel-tempering (PT) [19], where configurations for nearby temperatures 
are occasionally swapped (using the Metropolis acceptance probability). One Monte 
Carlo step (MCS) is then defined as one Metropolis sweep, an over-relaxation sweep of 
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Figure 3. Mapping of a 128x128 lattice to thread blocks on GPU. Each thread block 
of 16x16=256 threads performs updates on 32x32=1024 spins. 

the entire lattice, followed by one parallel-tempering exchange attempt for each pair of 
adjacent temperatures. 

To implement the parallel Metropolis and over-relaxation updates in a way suitable 
for the GPU, we divide the entire lattice into blocks of 32 x 32 = 1024 spins. Each block 
is decomposed into two different sub-lattices, as shown in Fig. [31 Each block is assigned 
to a thread block [16] containing 16 x 16 = 256 threads, which execute the same GPU 
kernel in parallel [I6]. Each thread is responsible for updating 2x2 = 4 spins, with 
two "black" sites and two "white" sites, so that there are enough arithmetic operations 
to hide the latency of the global memory accesses [16]. We apply the checkerboard 
decomposition algorithm to perform the Metropolis single-spin flips in parallel [20| [2T]. 
We first update all the black spins in parallel via a GPU kernel. After all the black spins 
belonging to different blocks are updated, another kernel is launched to update all the 
white spins. 

Due to the special architecture of the GPU, the commonly used Mersenne- Twister 
(MT) random number generator (RNG) can not be efficiently implemented at the thread 
level. Instead, we use a faster RNG implementation specially designed for the GPU 
architecture, the Warp Generator [22j. We note that although it has a smaller period 
of 2^°^^ — 1 than MT (2^^^^^ — 1), we do not find any noticeable statistical bias compared 
with the CPU runs using the MT. 

It is well established that the single-spin flip Metropolis update suffers from critical 
slowing down near phase transitions and for increased efficiency one has to resort to 
cluster updates [231 l2l|- However, GPU implementations of the cluster update are 
complicated and less efficient [25]. We instead implemented the microcanonical over- 
relaxation update [m [18] and found it to be as efficient as the cluster update in reducing 
slowing-down. It should also be noted that slowing-down is not very serious at the BKT 
transition compared to standard critical points. 



Finite-size scaling method for the Berezinskii-Kosterlitz- Thouless transition 



7 



In an over-relaxation move, the new spin direction on site i is obtained by reflecting 
it with respect to its local molecular field, Hj = — ^31 according to 

S: = -S. + 2^H,. (9) 

This update maps the system from a point in the phase space to another point with 
exactly the same energy. After several sweeps, the system is able to explore a larger 
region of the phase space without being stuck at a particular local minimum for a long 
time, thus improving the ergodicity of the simulation. 

To better equilibrate the simulations and further reduce slowing-down effects 
close to the transition, we also perform PT sweeps [19] on many systems at different 
temperatures simulated simultaneously. After a certain number of MCSs (often just 
one), we swap two adjacent configurations Xm,X„ at neighboring temperatures Tm,T„ 
with the acceptance probability of 

iy(X^,T„|X„,T„) = min [1, e(i/'r™-i/T^")(^"^-^")] , (10) 

where En is the total energy of replica n. 

To reduce the amount of data transfer between the CPU and the GPU, we store 
all the spin configurations at different temperatures in the GPU global memory, and 
all updates are performed through the kernel functions on the GPU. Measurements 
are also performed on the GPU and the results are sent back to the CPU for binning. 
Simulations were carried out at 21 temperatures ranging from T = 0.888 to T = 0.898 
for system sizes ranging from L = 16 to L = 512 in steps of 16 (to keep optimal sizes for 
the GPU memory structure, as illustrated in Fig. |3]). In each simulation, about 1.2 x 10^ 
measurements were made after 10^ MCSs for equilibration. The data were blocked into 
bins of 10^ measurements, which were subject to further analyses post-simulation. The 
simulations were performed on Tesla C2090 CPUs, and took approximately 3600 GPU 
hours for producing the whole data set discussed in this paper. 

We also used standard CPUs with single-spin and cluster updates for small systems. 
For the range of systems where we have results from both CPU and GPU calculations, 
they agree perfectly within statistical errors. 

3. Results 

We here use system pairs of the form (L, L/2) and extract crossing points such as the one 
shown in Fig. |2l Note again that the constant C in Eq. (|3]) depends on L but for large 
L it must converge to an asymptotic value (which depends on the microscopic details of 
the model). We here first discuss a few more details of the fitting procedures and then 
study the convergence properties of the transition temperature and the constant C. 

3.1. Fitting procedures 

To systematically carry out the analysis illustrated in Fig. |2l we fit a polynomial 
(typically of second-order) to a range of Monte Carlo data for the two system sizes 
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Figure 4. The finite-size transition temperature extracted on the basis of system-size 
pairs (L/2,L) versus 1/L. The curve is a second-order polynomial fitted to all the 
data points, giving the infinite-size extrapolated value of the transition temperature 
Tbkt = 0.89274(1), where the number within parenthesis indicates the statistical error 
of the last digit. The lower panel shows is a zoom- in of the large-size data. 

close to the transition. The crossing point is extracted using the polynomials, and the 
deviation of the stiffness constant from the NK value (including the renormalization 
factor discussed in Sec. I1.2p is recorded. A sweep of such fitting procedures is carried 
out to minimize the NK deviation, which can be done to machine precision. Error bars 
are computed by repeating this procedure for a large number (hundreds) of bootstrap 
samples of the data. 

3.2. Transition temperature 

Fig. m shows our results for Tbkt(-^7 -^^/2), for system sizes in the range L = 4 to 512. 
A second-order polynomial gives a statistically sound fit to all the data, and removing 
small system sizes does not affect the extrapolated L — t- oo value. The size dependence 
is weak and essentially linear, with a very small qudratic correction required when 
including small sizes in the fit. Naturally, the standard deviation of the extrapolated 
result increases as the data set becomes smaller. For example, including all the data 
points starting from L = 4 we obtain Tbkt = 0.89274(1), where the number within 
parenthesis is the standard deviation of the preceding digit. Starting instead from 
L = 32 we obtain Tbkt = 0.89275(2). These numbers agree within statistical errors 
and the quality of the fit is very similar in the two cases. There is, thus, no reason to 
exclude the small sizes, and we present Tbkt = 0.89274(1) as our final result for the 
transition temperature. To our knowledge, the best previous result, obtained recently 
in a large-scale GPU study with system szies up to L = 16384 was Tbkt = 0.89289(6), 
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Figure 5. Comparison of our finite-size estimates of Tbkt (witlr the second-order 
polynomial fit shown as well) and those of Ref. [TU] (where the lines between data 
points only provide a guide to the eye). 

which deviates from our result by about 2.5 standard deviations, i.e., the calculations 
are marginally consistent with each other. 

To further demonstrate how our calculation leads to a significantly more precise 
value (reducing the statistical error by a factor of 6), even though much smaller systems 
were used, we compare our finite-size data directly with those of Ref. [10] in Fig. |5l Note 
that the finite-size definitions of Tbkt are different in the two calculations, and that the 
data of Ref. [10] were originally analyzed in a different way, with a logarithmic correction 
(adjusted for the best fit to the data) to the infinite-size Tbkt- The comparison is still 
very illuminating. It is clear that our definition has much smaller size corrections, and 
also that the corrections in the data of [10] cannot be captured by a polynomial (hence 
the different method of analysis in Ref. [ID])- The statistically marginal agreement 
between the two extrapolations could be due to neglected higher-order logarithmic 
corrections neglected in Ref. [TD] (which, we have argued, are taken into account 
implicitly in our work by the size- dependent constant C). 

3.3. The constant C 

The very rapid, asymptotically linear convergence of our finite-size data suggests that 
the method of letting the constant C in Eq. ([2]) be size-dependent effectively captures 
(or eliminates) higher-order logarithmic corrections [H] to the WM form. It is then 
also interesting to examine the convergence of C. The results are shown in Fig. |6l 
Here the results cannot be well described by a low-order polynomial when small system 
sizes are excluded. All data points can be fitted to the form a + h ■ e~'^^(l + dx + ex^), 
where x = 1/L. This is the form shown in the figure. For extrapolating to infinite 
size, a quadratic polynomial in the range where it works gives the same result as the 
exponential form above; C = 2.12(3). 
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Figure 6. Size dependence of the constant C in the WM logarithmic correction used 
in Eq. ([3]), computed using system-size pairs (i/2, L). The continuous curve is a fit of 
the form discussed in the main text. The upper panel shows all the data used in the 
analysis and the lower panel focuses on the larger systems. 



4. Summary 

We have presented an improved finite-size scaling method for studying the BKT 
transition. Taking advantage of the NK relationship governing the spin stiffness at 
Tbkt in the thermodynamic limit, we defined a two-size estimate (curve-crossing) for 
the transition temperature which is constrained by this relationship also for finite size. 
We tested the procedure for the standard 2D XY model, with high-precision finite-size 
data obtained by Monte Carlo simulations on CPUs on system sizes up to 512 x 512. The 
results exhibit a very rapid, almost linear convergence of Tbkt? and the extrapolated 
value Tbkt = 0.89274(1) is, to our knowledge, the most precise estimate so far for this 
model. We anticipate that our scaling method should be the preferred way to extract 
Tbkt in general. 

Note that the form (|2]) of the logarithmic correction is asymptotically valid only 
exactly at T = Tbkt [12] , but that does not invalidate our use of it at any temperature 
for the purpose of a procedure to analyze finite-size data. Since the crossing points 
tend to Tktb in the thermodynamic limit, the assumptions underlying the use of ([3]) in 
combination with the NK criterion are guaranteed to hold asymptotically. Whether 
or not the actual higher-order corrections to ([2]) can be described well with a size 
dependent C (i.e., the remaining size dependence of Tbkt is weak), the procedure is 
unbiased in the limit L — )■ oo. The fact that results for Tbkt depends only weakly 
on the system size, and can be described by a low-order polynomial, strongly suggests 
that the size dependent C does capture the higher-order corrections very well. While 
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we have not formally proved that the size- dependent constant C (in combination the 
NK relationship) eliminates logarithmic corrections, the fact that our data can be well 
described with a very simple form without logarithms strongly suggests it. It would 
be interesting to test these numerical results also based on the renormalization-group 
treatment of the BKT transition. 
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